Continuous cholinergic-dopaminergic updating in the nucleus accumbens underlies approaches to reward-predicting cues

The ability to learn Pavlovian associations from environmental cues predicting positive outcomes is critical for survival, motivating adaptive behaviours. This cued-motivated behaviour depends on the nucleus accumbens (NAc). NAc output activity mediated by spiny projecting neurons (SPNs) is regulated by dopamine, but also by cholinergic interneurons (CINs), which can release acetylcholine and glutamate via the activity of the vesicular acetylcholine transporter (VAChT) or the vesicular glutamate transporter (VGLUT3), respectively. Here we investigated behavioural and neurochemical changes in mice performing a touchscreen Pavlovian approach task by recording dopamine, acetylcholine, and calcium dynamics from D1- and D2-SPNs using fibre photometry in control, VAChT or VGLUT3 mutant mice to understand how these signals cooperate in the service of approach behaviours toward reward-predicting cues. We reveal that NAc acetylcholine-dopaminergic signalling is continuously updated to regulate striatal output underlying the acquisition of Pavlovian approach learning toward reward-predicting cues.

The ability to learn to associate environmental cues with positive outcomes is critical for survival. These Pavlovian associations directly enable the acquisition of complex emotional and motivational states toward cues signalling food sources (i.e. rewards) 1 . These reward-predicting cues can acquire motivational properties themselves, in the form of incentive salience, or 'wanting', which can manifest as seeking behaviours such as approaching the predictive positive environmental cue 2 . Importantly, substantial evidence from both human and animal studies supports the hypothesis that endophenotypes in addiction, schizophrenia, depression, Parkinson's disease, and other forms of psychopathological processes are the result of abnormal incentive salience processing 1 .
It has been suggested that NAc CINs encode motivational signals supporting approach or avoidance behaviours 29,30 . For example, microdialysis studies in rodents have shown that extracellular ACh levels in NAc increase during conditions that reduce reward-seeking behaviours such as satiety 31,32 , conditioned taste aversion 33 , anxiety-like and depression-like states 34,35 , and drug or sugar-binge withdrawal 32,36-39 . In contrast, salient reward-predicting cues that encourage motivated behaviour have been shown to promote a characteristic 'pause' in CIN firing 8,9,13,40,41 . This CIN pause coincides with phasic DA activity during learning 11,13 , suggesting a CIN-DA gating mechanism regulating plasticity at corticostriatal synapses onto SPNs 22,42 .
Despite this emerging evidence for a role of both NAc DA and ACh in cue-motivated behaviours, little is known about the relationship and interaction between DA and CIN-mediated signalling during the acquisition of Pavlovian associations. Understanding this relationship in vivo is especially critical given the potential role of both DA and CINs in constantly updating cue-reward associations to optimise reward-mediated behaviours, and given how disruption of this relationship contributes to neurodevelopmental, psychiatric, and neurodegenerative disorders 43-45 .
Another unanswered question regarding NAc reward learning-related circuitry is whether the critical neurotransmitter released by CINs is, in fact, ACh. This is a relevant question because CINs have been shown to co-release glutamate, mediated by expression of the vesicular glutamate transporter 3 (VGLUT3) 46 . Indeed, activation of VGLUT3-mediated glutamate release from CINs can directly affect plasticity in SPNs 47,48 , regulating DA release 49 and addiction-related behaviours 50 . A reason this question remains unanswered is that experiments aimed at manipulating or recording CIN activity typically involve local lesioning, optogenetics, chemogenetics, calcium imaging, pharmacology and/or electrophysiology, all of which have been highly informative, but none of which can distinguish the role of ACh and glutamate released from CINs.
To investigate the interactions between CIN-released neurotransmitters and NAc circuitry, we used genetically-encoded sensors and fibre photometry to record millisecond dynamics of ACh, DA, and calcium in putative D1- and D2-SPNs, in the NAc of mice during acquisition of a task that measures Pavlovian approach behaviours to reward-signalling cues 51 . We found that highly coordinated DA-ACh signalling underlies reward prediction and reward collection. Mice with disrupted striatal VGLUT3 behaved normally. However, decreased levels of the vesicular acetylcholine transporter (VAChT) in striatal CINs, which significantly reduces ACh release from CINs 49 , abolished coordinated DA signalling and disrupted concurrent D1- and D2-SPN calcium activity and Pavlovian approach behaviours, which were rescued by restoring VAChT in the NAc. Our results reveal how balanced dopaminergic-cholinergic signalling in the NAc regulates striatal outputs in the service of updating cue-motivated learning in mice.

Results
Mice performing the Autoshaping task exhibit approach behaviours directed towards stimuli predicting rewards
The Autoshaping task (Fig. 1) is a well-established Pavlovian behavioural paradigm for rodents that assesses the motivational and incentive salience properties of a rewarding unconditioned stimulus (US) and a neutral conditioned stimulus (CS) predicting rewards 51-62 . Briefly, repeated paired presentations of a CS anticipating rewards (CS+) can elicit conditioned responses including approaches toward the CS+, even though no response from the animal is required. This phenomenon is often referred to as sign-tracking. Presentation of a CS that is not associated with reward (CS-) leads to a decrease in approaches toward the CS-. Another type of conditioned response often observed in rodents is the development of approach behaviours toward the location of the US delivery (reward magazine) during the CS presentation, despite the rewards not being delivered until after the termination of the CS+ 63 . This phenomenon is often referred to as goal-tracking. We initially studied the behaviour of wild-type C57BL/6j mice (n = 12♂, n = 12♀) using the touchscreen-based Autoshaping task (Supplementary Fig. 1). Both male and female mice learned the association between the CS+ and delivery of a strawberry milkshake reward (10 μl), evidenced by an increase in the time mice spent approaching the CS+, and a reduction in the time spent approaching the CS- (S1→S10, Supp. Fig. 1a-d). When the reward contingency was reversed (S11→S20, Supp. Fig. 1a), both male and female mice initially spent more time approaching the new CS- (former CS+), and then shifted after several sessions towards spending more time approaching the new CS+ (former CS-). No sex differences in approach behaviours to the CS were found (p > 0.05). The touchscreen-based Autoshaping task is designed to record approach behaviours toward the location of the CS. In the present study we also recorded nose-pokes to the reward magazine. We observed that both male and female mice spent little time nose-poking the reward magazine during the CS presentation (Supp. Fig. 1e-f), indicating that the task set-up is effective in eliciting approach behaviours towards the location of the CS almost exclusively (i.e., sign-tracking). As there were no differences between male and female mice in approaches to the CS+ or CS-, in subsequent experiments we combined males and females into a single group for analysis.
Fig. 1 | The touchscreen Autoshaping task to assess Pavlovian approach behaviours toward reward-predicting stimuli. a Layout of the Autoshaping touchscreen operant chamber depicting the two screens (left, CS-; right, CS+) and the reward magazine (RM) delivering strawberry milkshake reward (10 μl). Each chamber was equipped with a back infrared photobeam (BIR) to initiate trials, and two front infrared photobeams (FIR) on each side of the RM to record approaches to the CS screen. An infrared photobeam inside the RM (not displayed) recorded latency time to collect rewards. b Flowchart overview of the Autoshaping task during acquisition (left) and reversal (right) training sessions. (left) Following a variable inter-trial interval (ITI), a trial was initiated by breaking the BIR, followed by presentation of the stimulus (CS+ or CS-) for 10 s. Upon CS+ offset a reward was delivered and a new ITI began once the mouse pulled away from the RM. Upon CS- offset, no reward was delivered, and a new ITI started. Within a single session, CS+ and CS- trials alternated pseudo-randomly. In total, each session ended after 20 CS+ and 20 CS- trials or after 60 min, whichever occurred first. (right) Following 10 acquisition sessions (1 session/day), mice underwent a total of 10 reversal sessions, in which the locations of the CS+ and CS- were reversed. c (left) In contrast to the previous version, both CS screens (left and right) had a 50% probability of delivering rewards in non-deterministic trials. Contingencies after CS+ or CS- remained as previously described. Within a single session a total of 20 CS+ and CS- trials were presented. (right) After 10 consecutive non-deterministic training sessions, mice underwent 10 consecutive deterministic training sessions as described in (b). Figure 1a was created with BioRender.com.
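The session structure described in the Fig. 1 legend can be illustrated with a minimal Python sketch. It is not the touchscreen software used in the study: the function names, the dictionary layout and the exact 10/10 balancing of rewarded trials per screen in the non-deterministic variant are assumptions made only for illustration, and the variable ITI and the 60 min cap are not modelled.

```python
import random


def cs_plus_side_for(session, acquisition_side="right", n_acquisition=10):
    """CS+ screen for a 1-indexed session: fixed in acquisition, swapped in reversal."""
    if session <= n_acquisition:
        return acquisition_side
    return "left" if acquisition_side == "right" else "right"


def make_session(n_per_type=20, deterministic=True, cs_plus_side="right", seed=None):
    """Return one session as a pseudo-randomly ordered list of trials (hypothetical helper)."""
    rng = random.Random(seed)
    if deterministic:
        cs_minus_side = "left" if cs_plus_side == "right" else "right"
        trials = (
            [{"side": cs_plus_side, "rewarded": True} for _ in range(n_per_type)]
            + [{"side": cs_minus_side, "rewarded": False} for _ in range(n_per_type)]
        )
    else:
        # Assumption: each screen is rewarded on exactly half of its presentations,
        # so screen location carries no information about reward.
        trials = [
            {"side": side, "rewarded": i < n_per_type // 2}
            for side in ("left", "right")
            for i in range(n_per_type)
        ]
    rng.shuffle(trials)  # CS+ and CS- trials are interleaved pseudo-randomly
    return trials


# Acquisition (S1-S10) followed by reversal (S11-S20)
schedule = {s: make_session(cs_plus_side=cs_plus_side_for(s), seed=s) for s in range(1, 21)}
```

In this sketch a reversal session differs from an acquisition session only in which screen is paired with reward, which is the manipulation the legend describes.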
Nucleus accumbens dopamine dynamics correlate with approach behaviours in the Autoshaping task
The formation of cue-reward associations depends on DA release in the NAc 59,64-66 . Here, we combined the recently developed GRAB DA2m 67 biosensor (hereafter GRAB DA ) with fibre photometry to characterise in vivo extracellular NAc DA dynamics in wild-type C57BL/6j mice (N = 8, n = 4♂, n = 4♀) learning the Autoshaping task (Fig. 2a-c, Supp. Fig. 2a-e). In addition, we recorded DA in an independent cohort of C57BL/6j mice (N = 9, n = 5♂, n = 4♀) performing a 'non-deterministic' variation of the Autoshaping task in which, on a given trial, the location of the CS+/CS- was determined pseudorandomly (50% probability) (Fig. 1c and Fig. 2a). In this version of the task mice were unable to predict which lit location (either left or right screen) was associated with reward delivery. After ten such sessions (S1→S10), mice underwent ten standard 'deterministic' sessions (S11→S20) in which the CS+/CS- location remained constant within and across sessions.
Using this combined approach, we obtained robust and reliable recordings of extracellular DA levels within the NAc (Fig. 2b and Supp. Fig. 2c) that progressively changed across training sessions ( Fig. 2c-f). We found that mice tethered for fibre photometry recordings behaved similarly to control mice without fibre optical implants during the Autoshaping task (p > 0.05, Fig. 2a and Supp. Fig. 1a), indicating no major effect of tethering or surgical implants.
DA dynamics were tightly coupled to approaches toward CS presentation (Fig. 2g, left panel and Supp. Fig. 2d-g). Specifically, as mice learned the task during acquisition sessions (S1→S10), the amplitude of the DA response became consistently larger during presentation of the CS+ compared to presentation of the CS-. Such changes were not seen during non-deterministic contingencies. Interestingly, when the locations of the CS+ and CS- were first reversed (S11), a large increase in DA levels was observed during presentation of the CS- (former CS+), whereas DA levels did not change during the CS+ (former CS-). Finally, after five consecutive reversal sessions (S11→S15) the amplitude of the DA response during stimulus presentation was larger during the CS+ compared to the CS- (Fig. 2g, left panel and Supp. Fig. 2f,g). Similarly, once mice performing non-deterministic contingencies began deterministic training contingencies (S11→S20), the DA response became significantly larger during CS+ trials.
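To make concrete what a comparison of "response amplitude" between CS+ and CS- trials involves, the following Python sketch aligns a photometry trace to event onsets and z-scores each trial against its own pre-event baseline. This section does not describe the processing pipeline used in the study, so the baseline window, the z-scoring step, the window lengths and all function names here are assumptions for illustration only.

```python
from statistics import mean, stdev


def peri_event_zscore(signal, fs, event_idx, pre_s=2.0, post_s=10.0):
    """Z-score the post-event window against the same trial's pre-event baseline.

    signal    : list of fluorescence samples (arbitrary units)
    fs        : sampling rate in Hz
    event_idx : sample index of the event (e.g. CS onset)
    """
    pre = int(round(pre_s * fs))
    post = int(round(post_s * fs))
    if event_idx - pre < 0 or event_idx + post > len(signal):
        raise ValueError("window falls outside the recording")
    baseline = signal[event_idx - pre:event_idx]
    mu, sd = mean(baseline), stdev(baseline)
    if sd == 0:
        raise ValueError("flat baseline, z-score undefined")
    return [(x - mu) / sd for x in signal[event_idx:event_idx + post]]


def mean_response(signal, fs, event_indices, pre_s=2.0, post_s=10.0):
    """Average the z-scored post-event trace over time, then over trials."""
    per_trial = [
        mean(peri_event_zscore(signal, fs, i, pre_s, post_s)) for i in event_indices
    ]
    return mean(per_trial)


def cs_difference(signal, fs, cs_plus_onsets, cs_minus_onsets, **window):
    """Relative CS+ response: mean CS+ amplitude minus mean CS- amplitude."""
    return (mean_response(signal, fs, cs_plus_onsets, **window)
            - mean_response(signal, fs, cs_minus_onsets, **window))
```

Under these assumptions, a positive `cs_difference` for a session corresponds to a larger DA response during the CS+ than during the CS-.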
Following CS+ offset, a phasic DA response was observed during reward delivery (Fig. 2g, right panel and Supp. Fig. 2h). Across the acquisition and reversal training sessions, the amplitude of this reward-evoked DA response progressively reduced as mice learned the association between CS+ and reward. In contrast, the amplitude of reward-evoked DA responses during the CS+ and CS- in non-deterministic contingencies remained constant across sessions (S1→S10), but when deterministic contingencies were established (S11→S20), the amplitude of the DA response significantly decreased across sessions. DA dynamics during CS+, CS-, and reward collection were closely correlated with approaches during the Autoshaping task (Fig. 2h, i). During acquisition and reversal sessions, the longer mice spent approaching the CS+ compared to the CS-, the larger the relative increase of NAc DA signal during CS+ trials (Fig. 2h, top panels). No correlation was observed on non-deterministic sessions (Fig. 2h, bottom panels). Also, the time mice spent approaching the CS+ was inversely correlated with the amplitude of reward-evoked DA responses (Fig. 2i, top panels), although this effect was not significant in the group of mice trained first in the non-deterministic contingencies (Fig. 2i, bottom panels). Together, our findings indicate that NAc DA dynamics during the Autoshaping task strongly correlate with approach behaviours and reward predictability, as previously demonstrated in other paradigms 68-72 .

Acetylcholine release from striatal cholinergic interneurons regulates the acquisition of approach behaviours
Within the NAc, CINs are proposed to play fundamental roles in modulating presynaptic DA release 18-20 , regulating the activity of local circuits 27,73 , and integrating environmental information to regulate behaviour 8-14 . However, NAc CINs also co-release glutamate 46 , and it remains unclear whether these functions depend mostly on ACh or glutamate release 74 . To disentangle the individual contributions from ACh or glutamate released from CINs in approach behaviours, we used two genetically modified mouse lines (VAChTcKO and VGLUT3cKO) 49,75 with selective knockout of the vesicular ACh transporter (VAChT, Fig. 3a) or the vesicular glutamate transporter (VGLUT3, Fig. 3b) in the striatum 46-48,75,76 . These two proteins are required for ACh or glutamate release from CINs, respectively. We found that VAChTcKO (VAChTcKO: N = 25, n = 11♂, n = 14♀; control: N = 24, n = 13♂, n = 11♀; Fig. 3c and Supp. Fig. 3), but not VGLUT3cKO mice (VGLUT3cKO: N = 23, n = 12♂, n = 11♀; control: N = 24, n = 12♂, n = 12♀; Fig. 3d and Supp. Fig. 4) failed to discriminate between the CS+ and CS- during acquisition sessions, demonstrated by their equal time spent approaching the CS+ and CS- during presentation (Fig. 3c and Supp. Fig. 3b). Interestingly, VAChTcKOs spent more time approaching the CS+ compared to the CS- during late reversal sessions, suggesting that some basic learning ability is preserved. This discrimination ability was not demonstrated until the ~17th session of training, indicating a severe learning impairment. This cannot be interpreted as intact reversal learning as these mice did not acquire the association initially, so for them there was no association to reverse. No sex differences were observed across genotypes when compared with their control littermates (p > 0.05).
Given this substantial behavioural impairment in VAChTcKOs, we next assessed whether in vivo ACh dynamics in the NAc changed during the acquisition of approach behaviours. We used GRAB ACh3.0 (hereafter ACh3.0) 77 injected within the NAc of an independent cohort of mice (VAChTcKO: N = 8, n = 4♂, n = 4♀; control: N = 7, n = 4♂, n = 3♀) to record rapid dynamic changes of extracellular ACh using fibre photometry during performance of the Autoshaping task (Fig. 4a-d and Supp. Fig. 5a-d). During CS+ trials, we observed a significant decrease (~5-8 s long) in ACh signalling across acquisition and reversal sessions after reward delivery in control littermate mice, but not in VAChTcKOs (Fig. 4b-e and Supp. Fig. 5e-g). Previous electrophysiological findings suggest that CIN pauses following a salient stimulus are critical to rapidly gate the influx of cortical inputs and synaptic plasticity onto SPNs to invigorate reward-predicting behaviours 8,9,13,40,41 . Consistently, our observations suggest that CIN-mediated pausing of tonic ACh release during rewards may underlie the development of approaches toward CS+ in mice. A previous report using microdialysis has shown that tonic striatal extracellular ACh levels in VAChTcKO are significantly reduced (~95%) 49 , which may limit the ability to detect decreased cholinergic signals using ACh3.0. It is therefore likely that cholinergic tone in VAChTcKO mice is so low that changes in CIN activity (such as pauses in activity) are unable to further modulate cholinergic tone. Additionally, a phasic increase (~1 s) in ACh signal that did not differ between genotypes was observed during both CS+ and CS- onset (p > 0.05, Supp. Fig. 5h). We also observed a phasic ACh response during CS+ offset that was significantly impaired in VAChTcKOs (Supp. Fig. 5i). This event was not observed during CS- offset. We found a significant inverse relationship between ACh response and approaches to CS+ in control littermates (Fig. 4f) but not in VAChTcKO mice (Fig. 4g) during acquisition training sessions. These results suggest that in mice with low levels of VAChT, presynaptic-mediated plasticity mechanisms regulating ACh release are severely impaired, disrupting Pavlovian approach behaviours.
CINs provide the primary source of ACh in the NAc 78 , but it has been reported that cholinergic neurons from the brainstem project to the striatum 79 to regulate local circuits underlying action strategies and cognitive flexibility 80 . Therefore, we investigated whether mice lacking 90% of VAChT expression from brainstem cholinergic neurons projecting to the striatum (En1-cre,VAChT flox/flox ) 81 might display deficits in the Autoshaping task. We found that both En1-cre,VAChT flox/flox (N = 17, n = 9♂, n = 8♀) and control littermate (N = 18, n = 9♂, n = 9♀) mice were able to learn to approach the CS+ across training sessions (Supp. Fig. 6), suggesting that the release of ACh from brainstem neurons projecting to the striatum and other brain regions contributes little to Pavlovian approach behaviours. Taken together, our work demonstrates that ACh released from CINs, but not glutamate released from CINs or ACh released from brainstem cholinergic neurons, plays an important role in encoding the cue-motivated incentive salience underlying approach behaviours.

Conditional VAChTcKO mice display abnormal dopamine dynamics correlated with approach behaviours
Given our findings that mice with disrupted ACh release from CINs (VAChTcKOs) are unable to produce the approach behaviours present in their littermate counterparts (Fig. 3c and Supp. Fig. 3), and that DA signalling in the NAc underlies these behaviours (Fig. 2 and Supp. Fig. 2), we next tested whether DA signalling associated with cue-motivated approaches is affected in VAChTcKO mice. Previous experiments suggest that in general DA release should be decreased 49 , but the use of photometry and GRAB sensors allows for evaluation of how ACh contributes to millisecond updating of DA signals that underlie behaviour. We used GRAB DA 67 to record in vivo NAc DA dynamics in VAChTcKO (N = 7, n = 3♂, n = 4♀) and control littermate (N = 8, n = 4♂, n = 4♀) mice performing the Autoshaping task (Fig. 5a, b). Consistent with earlier reports indicating that striatal CINs modulate presynaptic DA release 18-21,75 , and that VAChTcKO have DA deficits 49 , we found that DA response amplitude during CS+ trials was reduced in VAChTcKO mice when compared to controls (Fig. 5c). Importantly, differences in DA dynamics between lit CS+ and CS- were significantly larger at late acquisition and reversal sessions in controls but not in VAChTcKO mice (Fig. 5d, left panel and Supp. Fig. 7a). This resulted from the decreased DA signalling during the CS+, but also from the failure of DA levels during the CS- to decrease across sessions in VAChTcKO mice (Fig. 5b, bottom-right heatmap). This abnormal DA signalling in VAChTcKO mice likely decreases signal to noise and contributes to the observed behavioural deficits. Similarly, the amplitude of DA responses during reward collection was blunted in VAChTcKO mice and did not change across sessions (Fig. 5d, middle panel and Supp. Fig. 7b).
The relative increase of DA response during CS presentation was correlated with approaches in both controls and VAChTcKO mice (Fig. 5d, right panel, and Fig. 5e), but with stronger correlation in controls (acquisition: R² = 0.92, reversal: R² = 0.83) compared to VAChTcKO mice (acquisition: R² = 0.56, reversal: R² = 0.58). Also, the DA response to reward was significantly correlated with approaches to the CS+ during presentation across training sessions, except in acquisition sessions in VAChTcKO mice, where the correlation narrowly missed significance (p = 0.0540; Fig. 5f). Together, these findings suggest that despite VAChTcKOs exhibiting reduced NAc DA signalling during CS+ presentation and reward collection, they retain some ability to encode stimulus-reward associations, but to a lesser extent than control littermates (Fig. 5e, f), matching the late improvement in learning we observed (Fig. 3c and Fig. 5d, right panel). The observation that the amplitude of the DA signalling during the CS- remains constant in VAChTcKOs across sessions when compared to controls (Fig. 5b) may also reduce the ability of mutant mice to discriminate between CS contingencies. These observations highlight the critical and likely constant updating of DA-ACh signals underlying reward-mediated behaviours, and indicate that cholinergic dysfunction leads to more subtle changes than merely decreasing DA, as previously suggested 49 .
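For readers who want to see what an R² value of this kind summarises, the sketch below computes the coefficient of determination of a simple linear regression between two per-session measures. It assumes an ordinary least-squares fit with one predictor, for which R² equals the squared Pearson correlation; this section does not specify the regression model used in the study, and the function name and example numbers are hypothetical.

```python
def r_squared(x, y):
    """R^2 of a one-predictor least-squares fit (equals squared Pearson r)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either variable is constant")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical per-session values, for illustration only (not data from the study):
# x = difference in approach time (CS+ minus CS-), y = relative DA response
approach_diff = [1.0, 2.0, 3.0, 4.0]
da_response = [1.0, 3.0, 2.0, 4.0]
print(r_squared(approach_diff, da_response))
```

An R² near 1 means that session-to-session variation in one measure is almost fully accounted for by a linear function of the other, which is the sense in which the control values above indicate tighter coupling than the VAChTcKO values.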
Dysfunctional cholinergic signalling in the striatum drives abnormal direct and indirect spiny projecting neuron calcium dynamics
Previous reports have indicated that DA and ACh within the striatum often work in concert to regulate the activity and synaptic plasticity of SPNs from the direct and indirect pathways 20,23,24,26,42,82 . Indeed, evidence suggests concurrent dynamics in both SPN pathways regulate movement initiation, action selection, and/or behavioural reinforcement 3,83-86 . Importantly, it is suggested that altered DA-ACh balance may interfere with the coordinated activity of both SPN pathways and contribute to various neuropathologies including addiction and Parkinson's disease 3,84,86 . VAChTcKO mice exhibit deficits in both ACh (Fig. 4) and DA dynamics (Fig. 5), underscoring the close relationship between these two neurotransmitters. Thus, to understand the association between NAc ACh, DA, SPNs, and behaviour, we simultaneously studied the calcium activity of putative D1- and D2-SPNs during the acquisition of cue-motivated approach behaviours in the Autoshaping task, in both VAChTcKO and control mice.
D1-SPNs was achieved by co-injection of a Cre-Off AAV within the same mice. Given that 95% of striatal cells are SPNs, and previous reports using this approach have demonstrated that fluorescence arising from interneurons is minimal 87,91,92 , we assigned signals generated by jRCaMP1a to the indirect D2-SPN pathway and by GCaMP6s to the direct D1-SPN pathway. Finally, considering recent observations by Legaria et al. 93 indicating that calcium dynamics recorded from striatal SPNs may not reflect spiking-related events but instead may be non-somatic (dendritic) changes, we interpreted our calcium recordings as likely arising from dendritic neuronal sub-structures.
Similar to wild-type C57BL/6j (Fig. 2a and Supp. Fig. 1a), VAChT flox/flox (Fig. 3c, Supp. Fig. 3a, Supp. Fig. 5d and Supp. Fig. 6a) and VGLUT3 flox/flox (Fig. 3d and Supp. Fig. 4a) mice, control (D2-Cre) mice performing the Autoshaping task spent more time approaching the CS+ than the CS- across acquisition and reversal sessions (Fig. 6c), whereas VAChTcKO mice showed impaired approach behaviours toward the stimuli, reproducing the data in Fig. 3c and Fig. 5d. The calcium activity of putative D1-SPNs was characterised by multiphasic events during CS+ presentation and reward delivery (Fig. 6d), but a monophasic event in D2-SPNs during reward delivery across training sessions (Fig. 6e). These D1- and D2-SPN calcium events were severely disrupted in VAChTcKOs (Fig. 6d-f). Regarding D1-SPNs, we first observed a phasic calcium increase at CS+ and CS- onset across all training sessions in both control and VAChTcKO mice (Fig. 6g). In VAChTcKOs, this event was significantly larger during the first two acquisition sessions in the CS+ compared to the CS-. Second, the calcium signal amplitude significantly reduced as approaches toward the CS+ increased in controls but not in VAChTcKOs (Fig. 6h, top panels). Finally, following reward delivery the calcium signal in control mice was characterised by a bi-phasic burst (Fig. 6i) and pause event (Fig. 6j, top-left panel) across acquisition and reversal sessions. The amplitude of the phasic (burst) calcium increase was larger in control than VAChTcKO mice. Interestingly, we found that the amplitude of the pause after reward delivery significantly increased as mice spent more time approaching the CS+ (Figs. 6c and 6j). In contrast, although VAChTcKO mice also showed a bi-phasic burst-pause response in D1-SPNs after delivery of rewards, the amplitude of the pause was significantly reduced when compared to control mice.
Together, our findings suggest that the calcium activity of putative D1-SPNs during the CS+ presentation and reward delivery progressively decreased as reward predictability increased during the Autoshaping task. Moreover, the activity of D2-SPNs in control mice significantly increased after reward delivery (Fig. 6j, bottom-left panel). Surprisingly, we found this reward-evoked activity instead decreased in VAChTcKO mice. Also, although no phasic D2-SPN activity was observed after CS- offset in either control or VAChTcKO mice (Figs. 6e and 6j, bottom-right panel), the calcium signalling amplitude was reduced in VAChTcKOs when compared to control mice across all training sessions. Our findings suggest that an adequate balance of DA-ACh within the NAc is critical for regulation of the coordinated calcium activity of the direct and indirect SPN pathways underlying cue-motivated approach behaviours 28,94 .

Acetylcholine released from nucleus accumbens cholinergic interneurons is necessary to regulate approach behaviours
Previous reports have highlighted that neurons from the NAc 54,58-61,66 , but not the dorsal striatum 65,95 , encode the acquisition of Pavlovian approach behaviours. It has also been suggested, however, that lesions in the dorsal striatum of rats facilitate responses to the food reward magazine 60 and contribute to incentive salience 96 .
Although NAc ACh may be required for approach behaviour, VAChTcKO mice have reduced ACh release in the dorsal striatum as well 49,76 , which might contribute to the behavioural deficits we observed. Moreover, the behavioural disruption may reflect developmental adaptations of affected circuits due to genetic inactivation of VAChT early in development. To test these possibilities, we rescued the ability of CINs in adult VAChTcKOs to release ACh by local injection of an AAV-VAChT within the NAc (N = 10, n = 5♂, n = 5♀; Fig. 7a-d). Additionally, we co-injected AAV-ACh3.0 in the same group of mice to monitor ACh dynamics during the performance of the Autoshaping task. VAChTcKO mice co-injected with AAV-mCherry and AAV-ACh3.0 (N = 11, n = 5♂, n = 6♀) were used as negative controls (sham). We found that relative to the sham control group, mice with rescued expression of VAChT in the NAc approached the CS+ more than the CS- (Fig. 7e), similar to control littermate mice (p > 0.05). Moreover, consistent with the notion that NAc CIN pauses are necessary for the processing of incentive salience and synaptic plasticity 9,11,13,27,73,97 , VAChT-rescued mice showed significantly decreased ACh signalling across acquisition and reversal sessions after reward delivery (Fig. 7f-h). This pause event was not observed in sham VAChTcKOs, consistent with our earlier experiments (Fig. 4). Our findings suggest that the observed behavioural deficits in VAChTcKO mice are due to altered local circuitry mechanisms in the NAc specifically. These deficits can be rescued during adulthood by restoring the potential of CINs to generate a brief salience-evoked ACh pause response, most likely by maintaining 'optimal' cholinergic tone within the region.

Discussion
Using a combination of automated touchscreen testing, fibre photometry, genetically-encoded sensors, and genetic mouse lines with deletion of VAChT or VGLUT3 transporters in striatal CINs, we revealed the dynamics of dopaminergic-cholinergic signalling underlying cue-motivated approach behaviours. Specifically, we report that a constant interaction and updating between ACh and DA signalling are critical to coordinate circuit mechanisms regulating the calcium activity of the direct and indirect SPNs underlying approaches to reward-predicting cues. Moreover, we demonstrate that interfering with ACh release from CINs alters the balance of DA-ACh dynamics and disrupts the activity of both SPN pathways, leading to a profound impairment in learning associations between CS+ and rewards, reflected as impaired development of approach behaviours toward CS locations during the Autoshaping task. Notably, we also show that restoring the expression of VAChT in CINs selectively in the NAc rescued ACh dynamics and approach behaviours. These results provide direct evidence that ACh released from NAc CINs, but not glutamate, plays a prominent role in the transference of incentive salience from rewards toward environmental stimuli predicting rewards.
Article https://doi.org/10.1038/s41467-022-35601-x
Despite electrophysiological evidence suggesting the role of CIN pauses in the regulation of striatal circuitry function and behaviour, it has only been with the recent development of genetically-encoded biosensors 105 that in vivo recordings of extracellular ACh with subsecond resolution have been achieved 77 . This is particularly important considering that tonically-active CINs also co-release glutamate, and in vivo electrophysiological methods cannot distinguish the consequences of ACh and glutamate release. Consistent with a previous report using a Go/No-Go task in mice 102 , we observed that NAc ACh dynamics in control mice performing the Autoshaping task are mainly characterised by a transient reward-evoked decrease in cholinergic tone, likely reflecting the pause of activity of CINs electrophysiologically detected in vivo. This profound decrease of cholinergic signals correlates with the acquisition of approach behaviours, suggesting a relevant gating mechanism necessary for the acquisition or maintenance of cue-motivated learning behaviours. Supporting this hypothesis, we report that mice with disrupted ACh release (VAChTcKO) from CINs are profoundly impaired in learning to approach CS predicting rewards. Mice with disrupted glutamate release (VGLUT3cKO) from CINs perform similarly to controls. Reduced VAChT expression decreases ACh release to levels below the detection limit of microdialysis 49 . Because striatal baseline ACh levels are very low in VAChTcKO, CIN pauses in these mutants likely lack the potential to neuromodulate local plasticity mechanisms within the NAc. Thus, our fibre photometry observations in control mice strongly suggest that the reward-evoked ACh dynamics correlate with findings using in vivo electrophysiological approaches 8,9,40 .
Conversely, we suggest that the anomalous endophenotype in VAChTcKO mice results from disturbed ACh storage from cholinergic synapses leading to a blunted vesicular ACh release from CINs 76,106,107 , that reduces ACh signal-to-noise ratio. Together, these data strongly suggest that ACh released from CINs plays critical roles in the neuromodulation of striatal circuits underlying cue-motivated learning behaviours.
The potential of individuals to transfer the reinforcing and motivational properties of rewards toward environmental stimuli seems to depend on DA within the NAc 108 . For example, mesolimbic DA depletion with 6-hydroxydopamine 66 or DA receptor antagonism within the NAc 109 impairs both acquisition and performance of appetitive Pavlovian approach behaviour. Consistent with this idea, we demonstrated that approach behaviours during the Autoshaping task robustly correlate with rapid increases in NAc DA signalling across acquisition and reversal training sessions, but inversely correlate during reward collection. Furthermore, our findings suggest DA signalling in NAc encodes the level of certainty with which mice can predict rewards. Thus, under the non-deterministic Autoshaping contingencies, in which the probability of receiving rewards from each CS is 50%, DA signalling does not correlate with approach behaviours during CS+ presentation or reward delivery. The observation that DA signalling in VAChTcKO mice can still weakly correlate with approach behaviours when compared to controls suggests deficits in transferring the motivational incentive salience of rewards toward approach behaviours, yet mice are still able, at least to a certain degree, to associate the presentation of CS+ with rewards (however, this phenotype does not manifest as approach behaviours towards the location of CS). Supporting this hypothesis, previous reports have shown that striatal VAChTcKO mice are able to learn complex contingencies leading to rewards when performing training-intensive touchscreen-based behavioural tasks such as the heterogeneous sequence task, the pairwise visual discrimination task, and the 5-choice serial reaction time task (5-CSRTT) 49,75 . Finally, we observed a ~1 s long DA response at the onset of the CS presentation, even to the CS- (Fig. 2e, f).
Several lines of evidence suggest that in addition to the reward prediction error, DA responses are also observed during arousing sensory and/or novel events 110-113 . However, perhaps more likely is that this is a conditioned response that briefly generalises. Both CS+ and CS- are similar stimuli (large bright rectangles) that differ only in their spatial location. It is perhaps not surprising that following conditioning, when a large bright stimulus appears, there is a generalised DA response even to the CS-, which is rapidly curtailed once the system identifies the stimulus as the CS-. This explanation is similar to the idea that two sensory systems pass information to reward circuitry: a "low road", which provides rapid but low-resolution information, and a "high road" that provides high-resolution information that becomes available following a brief delay 114-116 .
Within the striatum, the integration and output of information to the rest of the basal ganglia relies on the activity of GABAergic SPNs, which constitute as much as 95% of the entire neuronal population within the region 3 . SPNs are divided into two equally-sized and molecularly distinct subpopulations segregated by their output projection pathways through the basal ganglia. SPNs of the direct pathway express Gs/olf-coupled D1 DA receptors (D1-SPNs) whereas SPNs of the indirect pathway express Gi/o-coupled D2 DA receptors (D2-SPNs) 94,117,118 . Additionally, both SPN subpopulations express cholinergic Gq-coupled M1 and Gi-coupled M4 receptors, with M4 being more abundant on D1-SPNs 28 . Although still controversial 3,83,86,99 , a recent model proposes that the activity of the direct and indirect SPN pathways 'compete' to determine the animal's behavioural response, via modulating synaptic plasticity at inputs onto SPNs 83 . In this work, by recording calcium dynamics simultaneously from both SPN subpopulations 87,92 , we observed that in control mice, the calcium activity of D1-SPNs was characterised by biphasic events during CS presentation and reward delivery, while D2-SPN activity manifested as a single reward-evoked event. Previous reports 119,120 using a combination of electrophysiological recordings and optogenetic manipulation partially agree with our findings, indicating that during Pavlovian conditioning tasks, D1-SPNs from dorsomedial striatum increase their activity as a function of reward value, while activity of D2-SPNs is reduced.
[Figure caption; opening words lost in extraction] ... deficits in acetylcholine release show abnormal dopamine dynamics in nucleus accumbens. a Schematic brain sections depicting location of fibre stub tips implanted within the nucleus accumbens of control littermate (top, black bar) and VAChTcKO (bottom, blue bar) mice. b Heatmaps illustrating trial average DA signal (z-score) from acquisition (Acq, S1→S10) and reversal (Rev, S11→S20) sessions (CS+, left panels; CS-, right panels). Bar indicates the CS presentation (10 s), arrow bar indicates reward delivery. c (Left panel) Averaged DA signal (z-score) from CS+ (red) and CS- (blue) trials during acquisition sessions (S1, S10) in control littermate and VAChTcKO mice. Bar indicates CS presentation (10 s) and arrow bar the reward delivery. (Right panel) Mean DA signal during CS+ (red bars) and CS- (blue bars) presentation at S1 and S10 acquisition sessions in control and VAChTcKO mice (one-way ANOVA, F(7,52) = 4.292, p = 0.0008). Scattered data points represent individual mice. d (Left panel) Mean DA signal amplitude (Δ) during CS presentation between control littermate and VAChTcKO mice (two-way RM-ANOVA Session × Genotype interaction, Acq: F(9,117) = 2.400, p = 0.0156; Rev: F(9,117) = 2.699, p = 0.0069). (Middle panel) Area under the curve (AUC) of DA signal during reward delivery in control littermate and VAChTcKO mice (two-way RM-ANOVA Session × Genotype interaction, Acq: F(9,117) = 6.758, p < 0.0001; Rev: F(9,117) = 2.843, p = 0.0046). (Right panel) Relative time (Δ) mice approached the CS during presentation. In contrast to controls (blank circles), VAChTcKO mice (blue circles) did not discriminate between CS stimuli across sessions (two-way RM-ANOVA Session × Genotype interaction, Acq: F(9,117) = 4.399, p < 0.0001; Rev: F(9,117) = 3.888, p = 0.0002). e Correlation analysis between the mean DA signal (Δ) during CS presentation, and the time mice spent approaching the CS stimuli (Δ). f Correlation analysis between the DA response during reward collection (AUC) and time mice spent approaching the CS+. A total of N = 8 (n = 4♂, n = 3♀) control littermate mice and N = 7 (n = 3♂, n = 4♀) VAChTcKO mice were used. Post-hoc Tukey's test: ***p < 0.0001, **p < 0.001, *p < 0.05. No adjustments were made for multiple comparison analyses. Data are presented as the mean ± SEM.
However, here we report, to our knowledge for the first time, that the rapid reward-evoked increase in D1-SPN calcium activity is followed by a pause response that opposes an increase of calcium in D2-SPNs, suggesting that during the Autoshaping task, the concurrent calcium dynamics of the direct and indirect SPN pathways are mutually necessary to encode the acquisition and maintenance of approach behaviours 60,121 . Consistent with this idea, we observed that in contrast to control mice, VAChTcKOs showed abnormal calcium activity from the direct and indirect SPN pathways, which likely underlies the observed deficits in learning the associations between CS and rewards. Previous reports have suggested that the activity of SPNs during stimuli conferring incentive salience heavily relies on the co-occurrence of rapid increases in DA release and cessations of ACh release 3,122 , but also on the differential expression of dopaminergic and cholinergic receptors among D1-SPNs and D2-SPNs 28,94 . For example, recent work demonstrated that the inhibition of D1-SPNs mediated by M4 receptors is indirectly regulated by the modulation of D2 receptors expressed in CINs 23 . It is plausible that the observed calcium hyperactivity in D1-SPNs of VAChTcKOs may be due to a reduced signalling of M4 receptors expressed within this subpopulation of neurons. In contrast, hypoactivity of D2-SPNs may be the result of a reduced activation of M1 receptors expressed in D2-SPNs. It is important to highlight that a recent elegant work from Legaria et al.
93 demonstrated that the striatal SPN calcium fibre photometry signal may not solely reflect spiking dynamics; instead, much of the signal may arise from the dense dendritic arborisation of neurons. Therefore, following the suggestions of Legaria et al. (2022), we interpret the observed SPN calcium signals, regulated by continuous updating of ACh and DA, as possibly reflecting a dendritic 'eligibility trace'. The eligibility trace is posited by emerging theories of synaptic plasticity as a kind of flag, set at the synapse by the co-activation of pre- and postsynaptic neurons, that leads to weight change in a susceptible synapse only if an additional factor such as novelty, punishment, or reward is present 123 . Moreover, this additional factor is often implemented by the phasic activity of neuromodulators such as DA and ACh 124-126 . Therefore, our observations may reflect a mechanism of eligibility trace for synaptic plasticity and behavioural conditioning 124,125,127 , mediated by a continuous updating of ACh and DA dynamics, and triggered by behaviourally relevant stimuli. Future work is needed to address how the heterogeneous contribution of dopaminergic and muscarinic receptors expressed in D1- and D2-SPNs may regulate the shape, volume, and stability of dendritic spines 128 , and how this could influence changes in synaptic plasticity mechanisms regulating behaviour.
The use of transgenic mice chronically affecting CIN function is a valuable tool for understanding relevant endophenotypes associated with brain disorders, and specifically in this case to separate the contributions of VAChT or VGLUT3-mediated neurotransmitter release for behaviour. An important issue is that developmental compensatory mechanisms often hinder the interpretation of how acute factors affect the release of ACh and/or glutamate from CINs and how behaviours are related to effects arising from chronic manipulations. For example, the expression of VAChT in the cortex is reduced by 50% in D2-Cre mice 49 , which could potentially contribute to the observed behavioural phenotypes in VAChTcKO mice. Moreover, although our findings support the idea that the NAc circuitry 54,58-61,66 is critical for the acquisition of Pavlovian approach behaviours (but see 65,95 ), others suggest that contributions from the dorsal striatum may facilitate incentive salience 96 and responses to collect rewards 55,62 . Because the re-expression of VAChT within the NAc of adult VAChTcKO mice restored the reward-evoked decrease in cholinergic tone (i.e. ACh pauses) and approach behaviours to levels comparable to control littermate mice during the Autoshaping task, it seems likely that the maturation of striatal network activity mechanisms underlying cue-motivated approach behaviours does not require ACh released from CINs 129 . Furthermore, our evidence strongly indicates that ACh, but not glutamate, released from CINs within the NAc, is necessary for the regulation of approach learning behaviours. Finally, although our study does not directly demonstrate that inserting VAChT into the NAc of VAChTcKO mice increases extracellular ACh baseline levels, it is important to highlight that in the striatum, only cholinergic interneurons require VAChT to transfer ACh from the cytoplasm into synaptic vesicles 130 . Currently, there is no other function for VAChT that we are aware of.
Taken together, our work suggests that an intricate balance between DA-ACh, and its reciprocal rapid updating during learning, is critical for the regulation of local network mechanisms and neuronal engrams underlying approaches to reward-predicting cues. Our observations shed new light on DA-ACh balance 122 which has been proposed as an aetiological mechanism underlying a variety of brain-related disorders including addiction, anxiety, obsessive-compulsive disorders, schizophrenia and Parkinson's disease.

Experimental design
Experiments were performed on 3- to 7-month-old male and female mice. Unless otherwise indicated, animals were housed in groups of two to four per cage at 22-23°C, 50 ± 10% humidity, with a 12:12 h reverse light-dark cycle. Food and water were provided ad libitum until behavioural testing, at which point mice were mildly food restricted (90-95% of their original body weight) to increase their motivation to perform a behavioural task. Experiments were performed during the dark cycle (between 9:00 a.m. and 6:00 p.m.).
To characterise how the co-transmission of acetylcholine and glutamate from striatal CINs, or cholinergic pedunculopontine/laterodorsal tegmental nuclei neurons projecting to the striatum impacts on Pavlovian approach associative learning behaviours, independent cohorts of wild-type C57BL/6j mice, or mutant mice and their corresponding control littermates were used to perform an automated touchscreen Autoshaping task (Fig. 1). Separate cohorts of wild-type and mutant mice were used for fibre photometry experiments and behavioural testing. Mice with head implants were single housed to prevent infection of surrounding incision area or damage to the implant.

Touchscreen Autoshaping task
Experiments were conducted using automated Bussey-Saksida Mouse Touchscreen Systems Model 80614-20 (Lafayette Instruments, Lafayette, IN). The touchscreen Autoshaping task has been previously described 51,52 . Experiments were carried out inside sound-attenuating cabinets, each consisting of a standard operant chamber and a 12.1-inch touchscreen monitor. The operant chamber was trapezoidal-shaped and constructed from three black Plexiglas walls, which open to the touchscreen (Dimensions: 20 cm x 18 cm screen-reward tray x 24 cm at screen) (Fig. 1a). The ceiling of the chamber was made of clear Plexiglas and the floor of perforated stainless steel with a waste tray situated below. The chamber was equipped with a liquid reward dispensing magazine located centrally in front of the touchscreen and linked to a liquid reward dispenser pump (strawberry milkshake, Neilson Dairy). A light emitting diode illuminated the food magazine during reward delivery. Computer graphic white square stimuli were presented on the touchscreen at either the left or right side of the reward magazine. A miniature infrared camera was installed above the chamber to allow monitoring of the animals' behaviour. Animal activity was recorded via infrared photobeams located in front of each side of the screen (approaches), at the entry to the reward magazine (reward collection latency) and at the opposite side of the screen (trial initiation) (Fig. 1a). Schedule design, control of the apparatus via Whisker control system, and data collection used ABET II Video Touch software V21.02.26.
The Autoshaping task consisted of two pre-training phases followed by ten consecutive acquisition sessions and ten reversal sessions (one session per day, seven days a week). The first pre-training phase consisted of a single session in which the animal habituated to the operant chamber by remaining inside it for 30 min. No action was triggered regardless of the mouse's behavioural status. The second pre-training phase consisted of at least two consecutive sessions (30 min each) in which reward (~7 μl) was delivered after a variable ITI (0-30 s; additional time allowed if necessary to ensure the animal is not in the magazine when the ITI ends), with the magazine illuminated and a tone (1 s, 3 kHz, 80 dB) emitted upon delivery. The animal had to enter the magazine to collect the reward (upon which the magazine light extinguished) to initiate the next delay period. Criterion was reached when the animal collected at least 50 rewards during the session. Animals not able to reach criterion after four consecutive pre-training sessions were excluded from the study. In total, 3 VGLUT3cKO (n = 2♂, n = 1♀), 1♀ control VGLUT3, and 1♂ VAChTcKO were excluded.
Fig. 7 | Deficits in approach behaviours in VAChTcKO mice are rescued after expressing VAChT within the nucleus accumbens. a VAChTcKO (Rescued) mice received bilateral nucleus accumbens injections of 1:1 AAV-VAChT.mCherry (red) and AAV-ACh3.0 (green). This approach allowed us to rescue the expression of VAChT within the nucleus accumbens of VAChTcKO mice and simultaneously record ACh dynamics during the Autoshaping task using fibre photometry. Alternatively, an independent group of VAChTcKO (Sham) mice received injections of 1:1 AAV-mCherry and AAV-ACh3.0. b Western blot analysis demonstrated that VAChTcKO mice receiving AAV-VAChT.mCherry injections (N = 4 mice), but not AAV-mCherry injections (N = 4 mice), showed immunoreactivity for VAChT protein expression within the nucleus accumbens. Immunoreactivity for mCherry was found in both treatments. Expression of actin and synaptophysin (Syn) were used as a protein loading control. c Schematic brain sections from VAChTcKO mice depicting AAV injection site of ACh3.0+VAChT (red circles), ACh3.0+mCherry (green circles) and tip of fibre optic tracks (black bars). d Representative immunostaining from a VAChTcKO-rescued mouse receiving bilateral injections of AAV-ACh3.0 + AAV-VAChT.mCherry. Immunoreactivity for ACh3.0 (green) and VAChT (red) was observed within the nucleus accumbens in both hemispheres. Nuclei were stained with Hoechst (blue). The tip of the fibre optic track was also located within the nucleus accumbens. GFP and mCherry immunoreactivity was reproduced in all mice tested in this study (see below). Scale bar: 1000 μm. e (Left panels) VAChTcKO-rescued but not VAChTcKO-Sham mice spent more time approaching the CS+ across sessions (Rescued, two-way RM-ANOVA Session × CS interaction, Acq: F(9,162) = 3.073, p = 0.0020; Rev: F(9,162) = 5.304, p < 0.0001. Sham, Acq: p > 0.05; Rev: F(9,180) = 2.224, p = 0.0225), similar to control littermate mice (Control, two-way RM-ANOVA Session × CS interaction, Acq: F(9,108) = 6.398, p < 0.0001; Rev: F(9,108) = 3.027, p = 0.0029). Control mice shown here correspond to the experiment of Supp. Fig. 5d. (Right panel) Similar to controls (blank circles, p > 0.05), VAChTcKO-rescued mice (orange circles) spent more time (Δ) approaching the reward-predicting CS+, while the VAChTcKO-Sham mice (blue circles) did not discriminate between the CS stimuli (two-way RM-ANOVA Session × Treatment interaction, Acq: F(18,225) = 2.749, p = 0.0003; Rev: p > 0.05). Compared to controls, VAChTcKO-Sham mice were impaired during acquisition sessions (two-way RM-ANOVA Session × Treatment interaction, Acq: F(9,144) = 4.646, p < 0.0001; Rev: p > 0.05). f Heatmaps illustrating trial average ACh dynamics (z-score) across sessions. Bar indicates CS presentation and arrow bar reward delivery. g (Left) Trial average ACh signal (z-score) during the CS+ (red) and CS- (blue) at early (S1) and late (S10) acquisition sessions in VAChTcKO-Rescued and VAChTcKO-Sham mice.
On the day after pretraining, animals were trained to associate the presentation (10 s) of a conditioned stimulus (CS) with the delivery of 10 μl of strawberry milkshake reward pumped into the central magazine. During a trial, a stimulus on one side of the screen (e.g., right) was designated to anticipate the delivery of a reward (CS+), while the opposite side of the screen (e.g., left) did not lead to reward contingencies (CS−). The locations of CS+ and CS− were counterbalanced across mice but, once designated, remained constant across consecutive trials unless otherwise indicated. A single CS contingency was presented per trial. After a variable inter-trial interval (ITI, 45-90 s), the mouse initiated a trial by breaking the back infrared beam (BIR) within the chamber (Fig. 1b). During CS onset, a click tone (0.2 s, 2 kHz, 80 dB) was generated to maximise the probability that the animal would be able to see both sides of the screen upon stimulus presentation and to minimise inadvertent stimulus approaches.
Upon CS+ offset, a tone (1 s, 3 kHz, 80 dB) was emitted, a reward was delivered to the magazine, and a light inside the reward magazine was illuminated until the first nose poke for reward collection was registered via an infrared beam located inside the reward magazine. Upon CS− offset, no reward was delivered and no tone or light inside the reward magazine was generated. Following CS offset (and, if reward was delivered, entry into the magazine for reward collection), a new variable ITI began. The house light remained off throughout the session. A full session consisted of 40 trials, including 20 presentations of each CS contingency delivered in a pseudorandom order such that no more than two similar CS trials were repeated consecutively. Sessions ended after 40 trials or 60 min, whichever was reached first. Mice were trained 1 session per day. In total, mice underwent 10 consecutive acquisition sessions, followed by 10 reversal sessions, in which the pre-determined location of CS+ and CS− trials for each animal was reversed (CS+ becomes CS−, and CS− becomes CS+). It is anticipated that by reversing the contingency of the task, animals must adapt their reward prediction behavioural performance accordingly.
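The session schedule described above can be illustrated with a minimal Python sketch. It is not the ABET II schedule used in the study; the function name `make_session_order` is hypothetical, and the sketch assumes that "no more than two similar CS trials were repeated consecutively" means a run of identical CS trials never exceeds two.

```python
import random

def make_session_order(n_per_cs=20, max_run=2, seed=None):
    """Return a pseudorandom list of 'CS+'/'CS-' labels (hypothetical helper).

    Each label appears n_per_cs times and no label occurs more than
    max_run times in a row (assumed reading of the constraint in the text).
    """
    rng = random.Random(seed)
    while True:  # restart on the rare dead end
        remaining = {"CS+": n_per_cs, "CS-": n_per_cs}
        order = []
        while len(order) < 2 * n_per_cs:
            options = [cs for cs, n in remaining.items() if n > 0]
            # Forbid extending a run that already contains max_run trials.
            if len(order) >= max_run and len(set(order[-max_run:])) == 1:
                options = [cs for cs in options if cs != order[-1]]
            if not options:
                break  # only the forbidden label is left; try again
            # Weight by remaining count so both labels run out together.
            weights = [remaining[cs] for cs in options]
            choice = rng.choices(options, weights=weights)[0]
            order.append(choice)
            remaining[choice] -= 1
        if len(order) == 2 * n_per_cs:
            return order

order = make_session_order(seed=1)
print(order[:10])
```

Drawing trials one at a time and restarting on a dead end avoids the many rejected shuffles that a plain shuffle-and-check approach would need for 40 trials.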
To evaluate the Pavlovian nature of the task, an independent cohort of wild-type C57BL/6j mice underwent 10 non-deterministic acquisition sessions, in which each side of the screen had a 50% probability of being either CS+ or CS− (Fig. 1c). A total of 40 trials (20 CS+ and 20 CS− presentations) were delivered within 60 min per session. Under this non-deterministic contingency, animals were unable to predict which stimulus (left or right screen) anticipated the delivery of a reward. After the completion of the non-deterministic acquisition sessions, animals were trained for another 10 sessions with deterministic contingencies as previously described in Fig. 1b.
The primary performance measure in this task is the time mice spent in front of the CS+ and CS− screens. However, visits to the reward magazine (latency time to collect rewards), and latency and number of touches to the CS+ and CS− screen were also recorded.
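As an illustration of the primary measure, the following Python sketch sums the time a mouse spent in front of a screen during the CS window and contrasts CS+ with CS− trials. The interval representation of photobeam breaks, the function names, and the definition of the difference score as mean CS+ time minus mean CS− time are assumptions made for this example; the text does not specify how Δ was computed.

```python
def time_in_front(beam_intervals, cs_onset, cs_offset):
    """Seconds of beam-break time falling inside the CS window.

    beam_intervals: list of (start_s, end_s) tuples for one front photobeam
    (an assumed data layout, not the ABET II export format).
    """
    total = 0.0
    for start, end in beam_intervals:
        overlap = min(end, cs_offset) - max(start, cs_onset)
        if overlap > 0:
            total += overlap
    return total

def approach_delta(cs_plus_times, cs_minus_times):
    """Assumed difference score: mean CS+ approach time minus mean CS- time."""
    mean_plus = sum(cs_plus_times) / len(cs_plus_times)
    mean_minus = sum(cs_minus_times) / len(cs_minus_times)
    return mean_plus - mean_minus

# One trial with a 10 s CS starting at t = 1 s: two visits to the screen.
print(time_in_front([(0.0, 2.0), (5.0, 12.0)], cs_onset=1.0, cs_offset=11.0))
```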

Viral vectors
For experiments to record extracellular dopamine or acetylcholine using fibre photometry, expression of GRAB DA2m 67

Immunohistochemistry
Mice were anaesthetised with ketamine (100 mg kg−1)-xylazine (20 mg kg−1) and then transcardially perfused with ice-cold phosphate-buffered saline (PBS) followed by 4% paraformaldehyde (PFA). Brains were kept overnight in 4% PFA and then transferred into a PBS-azide solution, and a vibratome was used to cut 40 μm sections. After slicing, free-floating sections were rinsed with PBS and incubated in Tris-buffered saline (TBS) containing 1.2% Triton X-100 for 20 min. The sections were rinsed with TBS and blocked for 1 h in TBS containing 5% (v/v) normal goat serum at room temperature. After blocking, sections were rinsed twice with TBS and then incubated overnight at 4°C with chicken anti-GFP (Abcam, ab13970, 1:500) and rabbit anti-mCherry (Abcam, ab167453, 1:200) in TBS containing 0.2% Triton X-100 and 2% normal goat serum. The following day, after ~18 h incubation with the primary antibodies, sections were washed twice for 10 min each in TBS and then incubated for 1 h at room temperature with Alexa 488 goat anti-chicken (Thermo Fisher, A11039, 1:500) and Alexa 633 goat anti-rabbit (Thermo Fisher, A21070, 1:500) antibodies in TBS containing 0.2% Triton X-100 and 2% normal goat serum. The sections were washed twice in TBS for 10 min and then incubated with Hoechst 33342 (Thermo Fisher, H3570, 1:1000) to counterstain the nuclei. Images were captured using the Leica DM6B Thunder imager (Leica Microsystems Inc.).

Surgical procedures and fibre photometry
Viral infusions and optic fibre implants were carried out as previously described 75 . Briefly, mice were anaesthetised with 5% isoflurane induction rate and placed in a stereotaxic frame, after which anaesthesia was maintained at 1.5-3%. A heating pad was placed under the mice to maintain body temperature (37°C). The top of the skull was exposed, and holes were drilled for the viral infusion needle, optic fibre implant, and two skull screws.
Viral injections targeting the NAc were made using a microsyringe pump (0.5 μl, 0.1 μl/min) at the following coordinates from Bregma (AP: 1.8 mm, ML: 0.5 mm, DV: 4.0 mm) 137 . Injectors were left in place for 5 min and then slowly removed. Only mice receiving AAV9-eSyn.mCherry-2A-mSLC18A3-WPRE or AAV9-hSyn-mCherry were injected bilaterally; otherwise, counterbalanced unilateral viral injections were performed. Low-auto-fluorescence optic fibre implants (400 μm O.D, 0.48 NA, 5 mm-long, Neurophotometrics, San Diego, CA) were unilaterally inserted just above the injection site. Prior to experimentation, mice underwent a 3-week recovery period followed by food restriction (90-95% of their post-recovery body weight) for at least another two weeks. Mice were first allowed to adapt to the touchscreen chamber and fibre patch-cord during the touchscreen pretraining sessions (see above). To record fluorescence signals, the photometry system was equipped with a fluorescent mini-cube (Doric Lenses, Quebec, Canada) to transmit sinusoidal 465 nm LED light modulated at 572 Hz and a 405 nm LED light modulated at 209 Hz. LED power was set at ~25 μW. Fluorescence was collected through the patch-cord connected to the optic fibre implant of each mouse and transmitted back to the mini-cube, amplified, and focused into an integrated high sensitivity photoreceiver (Doric Lenses). Alternatively, for fibre photometry experiments requiring dual calcium recordings, a fluorescent mini-cube (Doric Lenses) able to transmit sinusoidal 465 nm LED (modulated at 572 Hz), 560 nm LED (modulated at 334 Hz), and a 405 nm LED (modulated at 209 Hz) was used. The real-time fluorescent signal was sampled at 12 kHz and then demodulated and decimated to 100 Hz using Doric Studio software V5.2.2.3 (Doric Lenses). The occurrence of behavioural events including CS onset and reward delivery was recorded by the same system via TTL inputs from ABET II (Lafayette Instruments).

Fibre photometry analysis
Analysis of the signal was done with custom-written Python software available at Mousebytes (https://mousebytes.ca/comp-edit?repolinkguid=ccf27660-6442-4c90-a14c-cdbc663a7b72). Fluorescence signals from the 405 nm, 465 nm, and 560 nm channels were low-pass filtered to remove events exceeding 6 Hz. The isosbestic 405 nm channel was used to correct for bleaching and movement optical artifacts. Accordingly, any fluctuations occurring in the 405 nm isosbestic channel were removed from the 465 nm and 560 nm channels before analysis. For this purpose, the least-squares linear fit method 138 was applied to the isosbestic 405 nm signal to align it to the 465 nm signal (or 560 nm), producing a fitted 405 nm signal used to normalise the 465 nm signal as follows:
$$\Delta F/F = \frac{\text{465 nm signal} - \text{fitted 405 nm signal}}{\text{fitted 405 nm signal}}$$
Then, to assess changes from baseline fluorescence signal after CS onset within trials, the baseline z-score of the $\Delta F/F$ was calculated as follows:
$$z\text{-score} = \frac{\Delta F/F - \mu_{\text{baseline}}}{\sigma_{\text{baseline}}}$$
where $\mu_{\text{baseline}}$ is the mean of the $\Delta F/F$ values from the baseline period (averaged signal collected 1 s before CS onset) and $\sigma_{\text{baseline}}$ is the standard deviation of the $\Delta F/F$ values from the baseline period.
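The normalisation steps above can be sketched in a few lines of standard-library Python. This is an illustrative reconstruction rather than the software deposited at Mousebytes: the function names are hypothetical, the 6 Hz low-pass filtering step is omitted, and the use of the population standard deviation for the baseline is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, pstdev

def fit_isosbestic(signal_405, signal_465):
    """Least-squares linear fit of the 405 nm trace onto the 465 nm trace."""
    mx, my = mean(signal_405), mean(signal_465)
    sxx = sum((x - mx) ** 2 for x in signal_405)
    sxy = sum((x - mx) * (y - my) for x, y in zip(signal_405, signal_465))
    slope = sxy / sxx  # fails if the 405 nm trace is perfectly flat
    intercept = my - slope * mx
    return [slope * x + intercept for x in signal_405]

def delta_f_over_f(signal_465, signal_405):
    """dF/F = (465 nm signal - fitted 405 nm signal) / fitted 405 nm signal."""
    fitted = fit_isosbestic(signal_405, signal_465)
    return [(s - f) / f for s, f in zip(signal_465, fitted)]

def baseline_zscore(dff, cs_onset_idx, fs=100, baseline_s=1.0):
    """z-score dF/F against the baseline_s seconds preceding CS onset.

    fs=100 matches the 100 Hz decimated signal; pstdev is an assumption.
    """
    n = int(round(baseline_s * fs))
    baseline = dff[cs_onset_idx - n:cs_onset_idx]
    mu, sigma = mean(baseline), pstdev(baseline)
    return [(v - mu) / sigma for v in dff]

# Toy check: a 465 nm trace that is an exact linear copy of the 405 nm
# trace carries no sensor-specific signal, so dF/F is ~0 everywhere.
s405 = [1.0 + 0.01 * i for i in range(200)]
s465 = [2.0 * v + 0.5 for v in s405]
print(max(abs(v) for v in delta_f_over_f(s465, s405)))
```

The same functions apply to the 560 nm channel by passing it in place of the 465 nm trace.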

Statistics, data collection, and analysis
Behavioural data were extracted from ABET II Video Touch software V21.02.26 (Lafayette Instruments). The data supporting these findings can be visualised and are freely available at MouseBytes (https://mousebytes.ca/home). Microscopy images were processed using ImageJ 64-bit V1.8 (NIH). The generation of heatmaps, estimation of area under the curve (AUC), and height peak analysis of events were obtained using OriginPro 2021 V9.8.0.200 (OriginLab Corporation, Northampton, MA). Briefly, heatmaps illustrating ΔF/F or z-score from GRAB DA , ACh3.0, or calcium signalling consisted of individual trials, or averaged trials (20 trials/session). Each began with a 1 s long baseline before CS onset, followed by a 10 s long CS+ or CS- presentation followed by the delivery of a single reward (7 μl) after CS+ offset. DA and calcium signal during CS presentation was calculated by averaging the signal during CS presentation. To find peaks, calculate height, and to integrate the AUC, the averaged trial signal from each session was used to find and calculate peaks (curve) within the ROI, obtaining the baseline by estimation of end points weighted (15%) within the window search. Alternatively, for the estimation of number of peaks during validation experiments recording calcium (Supp. Fig. 8d), the 1st derivative signal within the 10 min window search (saline vs. cocaine) was applied followed by the Savitzky-Golay method to smooth the signal. To calculate the AUC and height peak of the DA signal after reward delivery, the window search was set for the following 5 s after CS+ offset. The AUC and height of the peak of the reward response during ACh recordings were estimated by setting the window search for the 10 s after CS+ offset. Alternatively, the mean signal post-reward delivery for the calcium recordings in D1-SPN and D2-SPN was calculated by averaging the signal during the 10 s immediately post CS offset.
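The window definitions above can be made concrete with a short Python sketch operating on a single trial-averaged trace sampled at 100 Hz (1 s baseline, 10 s CS, then the post-CS period). It is a simplification and not the OriginPro procedure used in the study: the AUC here is a plain trapezoidal integral relative to zero, without the end-point-weighted baseline, and all names are hypothetical.

```python
FS = 100          # Hz, decimated sampling rate reported in the Methods
BASELINE_S = 1.0  # baseline before CS onset
CS_S = 10.0       # CS duration

def window(trace, start_s, end_s, fs=FS):
    """Samples of the trace between start_s and end_s (trial time in s)."""
    return trace[int(round(start_s * fs)):int(round(end_s * fs))]

def mean_cs_signal(trace):
    """Mean signal over the 10 s CS presentation."""
    segment = window(trace, BASELINE_S, BASELINE_S + CS_S)
    return sum(segment) / len(segment)

def reward_auc(trace, post_s=5.0, fs=FS):
    """Trapezoidal AUC over post_s seconds after CS offset.

    post_s=5.0 mirrors the DA window; 10.0 would mirror the ACh window.
    """
    start = BASELINE_S + CS_S
    segment = window(trace, start, start + post_s, fs)
    dt = 1.0 / fs
    return sum((a + b) / 2.0 * dt for a, b in zip(segment, segment[1:]))

# Synthetic trace: flat baseline, a step of 1 during the CS, 2 after offset.
trace = [0.0] * 100 + [1.0] * 1000 + [2.0] * 500 + [0.0] * 100
print(mean_cs_signal(trace), reward_auc(trace))
```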
All data were imported to GraphPad Prism V9.3.1 for Windows 10 (GraphPad Software, San Diego, CA) for statistical analysis. Initially, a normality test (D'Agostino-Pearson) was performed to determine whether parametric or non-parametric tests were appropriate. No assumptions or corrections were made prior to data analysis. Differences between two groups were always examined using a two-tailed Student's t-test, where p < 0.05 was considered significant and p > 0.05 was considered non-significant. Comparisons between multiple groups were performed using analysis of variance (ANOVA; one-way and two-way with repeated measures), followed by Sidak's multiple comparisons test. Alternatively, when a dataset had missing values, we compared groups using a linear mixed-effects ANOVA model. A simple linear regression analysis was used to correlate DA and ACh signalling with approaches to the CS presentation. Sample size in Autoshaping task experiments was estimated using a partial eta squared (ηp²) power analysis for repeated measures two-way ANOVA (power = 0.9, α = 0.05) 139 . For experiments combining fibre photometry and touchscreens, we used standard sample sizes (N = 7-11) as previously reported 87,93 . All cell counting, ISH, and immunohistochemistry experiments were performed by an experimenter blind to the experimental condition. Sample size for these experiments was estimated based on previous reports. All data were expressed as mean ± s.e.m. and p < 0.05 was considered statistically significant.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
The datasets generated and analysed in this study can be visualised and are freely accessible in the Mousebytes repository, https://mousebytes.ca/home, and complementary Mousebytes repository, https://mousebytes.ca/comp-edit?repolinkguid=ccf27660-6442-4c90-a14c-cdbc663a7b72. Additionally, generated data from all figures are provided in their corresponding Source Data files. Source data are provided with this paper.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.